******************
Query Optimization
******************

Updating Statistics
===================

With the **UPDATE STATISTICS ON** statement, you can generate internal statistics used by the query processor. Such statistics allow the database system to perform query optimization more efficiently. ::

    UPDATE STATISTICS ON { table_spec [ {, table_spec } ] | ALL CLASSES | CATALOG CLASSES } [ ; ]
    
    table_spec ::=
    single_table_spec
    | ( single_table_spec [ {, single_table_spec } ] )
    
    single_table_spec ::=
    [ ONLY ] table_name
    | ALL table_name [ ( EXCEPT table_name ) ]

*   **ALL CLASSES** : If the **ALL CLASSES** keyword is specified, the statistics on all the tables existing in the database are updated.

When starting and ending an update of statistics information, NOTIFICATION message is written on the server error log. You can check the updating term of statistics information by these two messages.

::
    
    Time: 05/07/13 15:06:25.052 - NOTIFICATION *** file ../../src/storage/statistics_sr.c, line 123  CODE = -1114 Tran = 1, CLIENT = testhost:csql(21060), EID = 4
    Started to update statistics (class "code", oid : 0|522|3).

    Time: 05/07/13 15:06:25.053 - NOTIFICATION *** file ../../src/storage/statistics_sr.c, line 330  CODE = -1115 Tran = 1, CLIENT = testhost:csql(21060), EID = 5
    Finished to update statistics (class "code", oid : 0|522|3, error code : 0).

.. note::

    In 2008 R4.3 or before and in 9.1, statistics information of all indexes was updated when an index was added, and it brought a high system load. However, from 2008 R4.4 and 9.2, only statistics information of an added index is created.

Checking Statistics Information
===============================

You can check the statistics Information with the session command of the CSQL Interpreter. ::

    csql> ;info stats table_name
    
*   *table_name* : Table name to check the statistics Information

The following shows the statistical information of *t1* table in CSQL interpreter.

.. code-block:: sql

    CREATE TABLE t1 (code INT);
    INSERT INTO t1 VALUES(1),(2),(3),(4),(5);
    CREATE INDEX i_t1_code ON t1(code);
    UPDATE STATISTICS ON t1;

::

    ;info stats t1
    CLASS STATISTICS
    ****************
     Class name: t1 Timestamp: Mon Mar 14 16:26:40 2011
     Total pages in class heap: 1
     Total objects: 5
     Number of attributes: 1
     Attribute: code
        id: 0
        Type: DB_TYPE_INTEGER
        Minimum value: 1
        Maximum value: 5
        B+tree statistics:
            BTID: { 0 , 1049 }
            Cardinality: 5 (5) , Total pages: 2 , Leaf pages: 1 , Height: 2

Viewing Query Plan
==================

To view a query plan for a CUBRID SQL query, you can use following methods.

*   Press "show plan" button on CUBRID Manager or CUBRID Query Browser. For how to use CUBRID Manager or CUBRID Query Browser, see `CUBRID Manager Manual <http://www.cubrid.org/wiki_tools/entry/cubrid-manager-manual>`_ or `CUBRID Query Browser Manual <http://www.cubrid.org/wiki_tools/entry/cubrid-query-browser-manual>`_.

    .. image:: /images/query_plan_on_CM.png

*   Change the value of the optimization level by running ";plan simple" or ";plan detail" on CSQL interpreter, or by using the **SET OPTIMIZATION** statement. You can get the current optimization level value by using the **GET OPTIMIZATION** statement. For details on CSQL Interpreter, see :ref:`csql-session-commands`.

**SET OPTIMIZATION** or **GET OPTIMIZATION LEVEL** syntax is as following.

::

    SET OPTIMIZATION LEVEL opt-level [;]
    GET OPTIMIZATION LEVEL [ { TO | INTO } variable ] [;]

*   *opt-level* : A value that specifies the optimization level. It has the following meanings.

    *   0: Does not perform query optimization. The query is executed using the simplest query plan. This value is used only for debugging.
    
    *   1: Creates a query plan by performing query optimization and executes the query. This is a default value used in CUBRID, and does not have to be changed in most cases.
    
    *   2: Creates a query plan by performing query optimization. However, the query itself is not executed. In general, this value is not used; it is used together with the following values to be set for viewing query plans.
    
    *   257: Performs query optimization and outputs the created query plan. This value works for displaying the query plan by internally interpreting the value as 256+1 related with the value 1.
    
    *   258: Performs query optimization and outputs the created query plan, but does not execute the query.  That is, this value works for displaying the query plan by internally interpreting the value as 256+2 related with the value 2. This setting is useful to examine the query plan but not to intend to see the query results.
    
    *   513: Performs query optimization and outputs the detailed query plan. This value works for displaying more detailed query plan than the value 257 by internally interpreting the value as 512+1.
    
    *   514: Performs query optimization and outputs the detailed query plan. However, the query is not executed. This value works for displaying more detailed query plan than the value 258 by internally interpreting the value as 512+2.

    .. note:: If you configure the optimization level as not executing the query like 2, 258, or 514, all queries(not only SELECT, but also INSERT, UPDATE, DELETE, REPLACE, TRIGGER, SERIAL, etc.) are not executed.   

The CUBRID query optimizer determines whether to perform query optimization and output the query plan by referencing the optimization level value set by the user. 
    
The following shows the result which ran the query after inputting ";plan simple" or "SET OPTIMIZATION LEVEL 257;" in CSQL.

.. code-block:: sql

    SET OPTIMIZATION LEVEL 257;
    --  csql> ;plan simple
    SELECT /*+ recompile */  DISTINCT h.host_year, o.host_nation
    FROM history h INNER JOIN olympic o 
    ON h.host_year = o.host_year AND o.host_year > 1950;

::    
     
    Query plan:

     Sort(distinct)
        Nested-loop join(h.host_year=o.host_year)
            Index scan(olympic o, pk_olympic_host_year, (o.host_year> ?:0 ))
            Sequential scan(history h)

*   Sort(distinct): Perform DISTINCT.
*   Nested-loop join: Join method is Nested-loop.
*   Index scan: Perform index-scan by using pk_olympic_host_year index about olympic table. At that time, the condition which used this index is "o.host_year > ?".
    
The following shows the result which ran the query after inputting ";plan detail" or "SET OPTIMIZATION LEVEL 513;" in CSQL.

.. code-block:: sql

    SET OPTIMIZATION LEVEL 513;
    --  csql> ;plan detail
    
    SELECT /*+ RECOMPILE */  DISTINCT h.host_year, o.host_nation
    FROM history h INNER JOIN olympic o 
    ON h.host_year = o.host_year AND o.host_year > 1950;

::

    Join graph segments (f indicates final):
    seg[0]: [0]
    seg[1]: host_year[0] (f)
    seg[2]: [1]
    seg[3]: host_nation[1] (f)
    seg[4]: host_year[1]
    Join graph nodes:
    node[0]: history h(147/1)
    node[1]: olympic o(25/1) (sargs 1)
    Join graph equivalence classes:
    eqclass[0]: host_year[0] host_year[1]
    Join graph edges:
    term[0]: h.host_year=o.host_year (sel 0.04) (join term) (mergeable) (inner-join) (indexable host_year[1]) (loc 0)
    Join graph terms:
    term[1]: o.host_year range (1950 gt_inf max) (sel 0.1) (rank 2) (sarg term) (not-join eligible) (indexable host_year[1]) (loc 0)

    Query plan:

    temp(distinct)
        subplan: nl-join (inner join)
                     edge:  term[0]
                     outer: iscan
                                class: o node[1]
                                index: pk_olympic_host_year term[1]
                                cost:  1 card 2
                     inner: sscan
                                class: h node[0]
                                sargs: term[0]
                                cost:  1 card 147
                     cost:  3 card 15
        cost:  9 card 15

    Query stmt:

    select distinct h.host_year, o.host_nation from history h, olympic o where h.host_year=o.host_year and (o.host_year> ?:0 )

On the above output, the information which is related to the query plan is "Query plan:". Query plan is performed sequentially from the inside above line. In other words, "outer: iscan -> inner:scan" is repeatedly performed and at last, "temp(distinct)" is performed. "Join graph segments" is used for checking more information on "Query plan:". For example, "term[0]" in "Query plan:" is represented as "term[0]: h.host_year=o.host_year (sel 0.04) (join term) (mergeable) (inner-join) (indexable host_year[1]) (loc 0)" in "Join graph segments".

The following shows the explanation of the above items of "Query plan:".

    *   temp(distinct): (distinct) means that CUBRID performs DISTINCT query. temp means that it saves the result to the temporary space.
    
        *   nl-join: "nl-join" means nested loop join.
        *   (inner join): join type is "inner join".
        
            *   outer: iscan: performs iscan(index scan) in the outer table.
            
                *   class: o node[1]: It uses o table. For details, see node[1] of "Join graph segments".
                *   index: pk_olympic_host_year term[1]: use pk_olympic_host_year index and for details, see term[1] of "Join graph segments".
                *   cost: a cost to perform this syntax.
                
                    *   card: It means cardinality.
                    
            *   inner: sscan: It performs sscan(sequential scan) in the inner table.
            
                *   class: h node[0]: It uses h table. For details, see node[0] of "Join graph segments".
                *   sargs: term[0]: sargs represent data filter(WHERE condition which does not use an index); it means that term[0] is the condition used as data filter.
                *   cost: A cost to perform this syntax.
                
                    *   card: It means cardinality.
                    
        *   cost: A cost to perform all syntaxes. It includes the previously performed cost.
        
            *   card: It means cardinality.

**Query Plan Related Terms**

The following show the meaning for each term which is printed as a query plan.

*   Join method: It is printed as "nl-join" on the above. The following are the join methods which are printed on the query plan. 

    *   nl-join: Nested loop join
    *   m-join: Sort merge join
    *   idx_join: Nested loop join, and it is a join which uses an index in the inner table as reading rows of the outer table.
    
*   Join type: It is printed as "(inner join)" on the above. The following are the join types which are printed on the query plan.
    
    *   inner join
    *   left outer join
    *   right outer join: On the query plan, the different "outer" direction with the query's direction can be printed. For example, even if you specified "right outer" on the query, but "left outer" can be printed on the query plan.
    *   cross join

*   Types of join tables: It is printed as outer or inner on the above. They are separated as outer table and inner table which are based on the position on either side of the loop, on the nested loop join.

    *   outer table: The first base table to read when joining.
    *   inner table: The target table to read later when joining.

*   Scan method: It is printed as iscan or sscan. You can judge that if the query uses index or not.
    
    *   sscan: sequential scan. Also it can be called as full table scan; it scans all of the table without using an index.
    *   iscan: index scan. It limits the range to scan by using an index.
    
*   cost: It internally calculate the cost related to CPU, IO etc., mainly the use of resources.

*   card: It means cardinality. It is a number of rows which are predicted as selected.
    
The following is an example of performing m-join(sort merge join) as specifying USE_MERGE hint. In general, sort merge join is used when sorting and merging an outer table and an inner table is judged as having an advantage than performing nested loop join. In most cases, it is desired that you do not perform sort merge join.

.. code-block:: sql

    SET OPTIMIZATION LEVEL 513;
    -- csql> ;plan detail

    SELECT /*+ RECOMPILE USE_MERGE*/  DISTINCT h.host_year, o.host_nation
    FROM history h LEFT OUTER JOIN olympic o ON h.host_year = o.host_year AND o.host_year > 1950;
    
:: 

    Query plan:
    
    temp(distinct)
        subplan: temp
                     order: host_year[0]
                     subplan: m-join (left outer join)
                                  edge:  term[0]
                                  outer: temp
                                             order: host_year[0]
                                             subplan: sscan
                                                          class: h node[0]
                                                          cost:  1 card 147
                                             cost:  10 card 147
                                  inner: temp
                                             order: host_year[1]
                                             subplan: iscan
                                                          class: o node[1]
                                                          index: pk_olympic_host_year term[1]
                                                          cost:  1 card 2
                                             cost:  7 card 2
                                  cost:  18 card 147
                     cost:  24 card 147
        cost:  30 card 147

The following performs the idx-join(index join). If performing join by using an index of inner table is judged as having an advantage, you can ensure performing idx-join by specifying **USE_IDX** hint.

.. code-block:: sql

    SET OPTIMIZATION LEVEL 513;
    -- csql> ;plan detail

    CREATE INDEX i_history_host_year ON history(host_year);
    
    SELECT /*+ RECOMPILE */  DISTINCT h.host_year, o.host_nation
    FROM history h INNER JOIN olympic o ON h.host_year = o.host_year;

::

    Query plan:

    temp(distinct)
        subplan: idx-join (inner join)
                     outer: sscan
                                class: o node[1]
                                cost:  1 card 25
                     inner: iscan
                                class: h node[0]
                                index: i_history_host_year term[0] (covers)
                                cost:  1 card 147
                     cost:  2 card 147
        cost:  9 card 147

On the above query plan, "(covers)" is printed on the "index: i_history_host_year term[0]" of "inner: iscan", it means that :ref:`covering-index` functionality is applied. In other words, it does not retrieve data storage additionally because there are required data inside the index in inner table.

If you ensure that left table's row number is a lot smaller than the right table's row number on the join tables, you can specify **ORDERED** hint. Then always the left table will be outer table, and the right table will be inner table.

.. code-block:: sql

    SELECT /*+ RECOMPILE ORDERED */  DISTINCT h.host_year, o.host_nation
    FROM history h INNER JOIN olympic o ON h.host_year = o.host_year;

.. _query-profiling:
 
Query Profiling
===============
 
If the performance analysis of SQL is required, you can use query profiling feature.
To use query profiling, specify SQL trace with **SET TRACE ON** syntax; to print out the profiling result, run **SHOW TRACE** syntax.
 
And if you want to always include the query plan when you run **SHOW TRACE**, you need to add /\*+ RECOMPLIE \*/ hint on the query.

The format of **SET TRACE ON** syntax is as follows.
 
::
 
    SET TRACE {ON | OFF} [OUTPUT {TEXT | JSON}]
 
*   ON: set on SQL trace.
*   OFF: set off SQL trace.
*   OUTPUT TEXT: print out as a general TEXT format. If you omit OUTPUT clause, TEXT format is specified.
*   OUTPUT JSON: print out as a JSON format.
    
As below, if you run **SHOW TRACE** syntax, the trace result is shown.
 
::

    SHOW TRACE;
    
Below is an example that prints out the query tracing result after setting SQL trace ON.
 
::
 
    csql> SET TRACE ON;
    csql> SELECT /*+ RECOMPILE */ o.host_year, o.host_nation, o.host_city, n.name, SUM(p.gold), SUM(p.silver), SUM(p.bronze)  
            FROM OLYMPIC o, PARTICIPANT p, NATION n
            WHERE o.host_year = p.host_year AND p.nation_code = n.code AND p.gold > 10 
            GROUP BY o.host_nation;
    csql> SHOW TRACE;
 
      trace
    ======================
      '
    Query Plan:
      SORT (group by)
        NESTED LOOPS (inner join)
          NESTED LOOPS (inner join)
            TABLE SCAN (o)
            INDEX SCAN (p.fk_participant_host_year) (key range: (o.host_year=p.host_year))
          INDEX SCAN (n.pk_nation_code) (key range: p.nation_code=n.code)

      rewritten query: select o.host_year, o.host_nation, o.host_city, n.[name], sum(p.gold), sum(p.silver), sum(p.bronze) from OLYMPIC o, PARTICIPANT p, NATION n where (o.host_year=p.host_year and p.nation_code=n.code and (p.gold> ?:0 )) group by o.host_nation

    Trace Statistics:
      SELECT (time: 1, fetch: 1059, ioread: 2)
        SCAN (table: olympic), (heap time: 0, fetch: 26, ioread: 0, readrows: 25, rows: 25)
          SCAN (index: participant.fk_participant_host_year), (btree time: 1, fetch: 945, ioread: 2, readkeys: 5, filteredkeys: 5, rows: 916) (lookup time: 0, rows: 38)
            SCAN (index: nation.pk_nation_code), (btree time: 0, fetch: 76, ioread: 0, readkeys: 38, filteredkeys: 38, rows: 38) (lookup time: 0, rows: 38)
        GROUPBY (time: 0, sort: true, page: 0, ioread: 0, rows: 5)
    '
 
On the above, later lines of "Trace Statistics:" are the output of the query trace. The following are the items of trace statistics.

**SELECT**
 
*   time: total estimated time when this query is performed(ms)
*   fetch: total page fetching count about this query
*   ioread: total I/O read count about this query. disk access count when the data is read

**SCAN**

*   heap: data scanning job without index

    *   time, fetch, ioread: the estimated time(ms), page fetching count and I/O read count in the heap of this operation 
    *   readrows: the number of read rows when this operation is performed
    *   rows: the number of result rows when this operation is performed
    
*   btree: index scanning job

    *   time, fetch, ioread: the estimated time(ms), page fetching count and I/O read count in the btree of this operation
    *   readkeys: the number of the keys which are read in btree when this operation is performed
    *   filteredkeys: the number of the keys to which the key filter is applied from the read keys
    *   rows: the number of result rows when this operation is performed; the number of result rows to which key filter is applied
    
*   lookup: data accessing job after index scanning

    *   time: the estimated time(ms) in this operation
    *   rows: the number of the result rows in this operation; the number of result rows to which the data filter is applied

**GROUPBY**    

*   time: the estimated time(ms) in this operation
*   sort: sorting or not
*   page: the number of pages which is read in this operation; the number of used pages except the internal sorting buffer
*   rows: the number of the result rows in this operation

The above example can be output as JSON format.
 
::
 
    csql> SET TRACE ON OUTPUT JSON;
    csql> SELECT n.name, a.name FROM athlete a, nation n WHERE n.code=a.nation_code;
    csql> SHOW TRACE;
    
      trace
    ======================
      '{
      "Trace Statistics": {
        "SELECT": {
          "time": 29,
          "fetch": 5836,
          "ioread": 3,
          "SCAN": {
            "access": "temp",
            "temp": {
              "time": 5,
              "fetch": 34,
              "ioread": 0,
              "readrows": 6677,
              "rows": 6677
            }
          },
          "MERGELIST": {
            "outer": {
              "SELECT": {
                "time": 0,
                "fetch": 2,
                "ioread": 0,
                "SCAN": {
                  "access": "table (nation)",
                  "heap": {
                    "time": 0,
                    "fetch": 1,
                    "ioread": 0,
                    "readrows": 215,
                    "rows": 215
                  }
                },
                "ORDERBY": {
                  "time": 0,
                  "sort": true,
                  "page": 21,
                  "ioread": 3
                }
              }
            }
          }
        }
      }
    }'

.. note:: 

    *   CSQL interpreter which is run in the standalone mode(use -S option) does not support SQL trace feature.

    *   When multiple queries are performed at once(batch query, array query), they are not profiled.
    
On CSQL interpreter, if you use the command to set the SQL trace on automatically, the trace result is printed out automatically after printing the query result even if you do not run **SHOW TRACE;** syntax.

For how to set the trace on automatically, see :ref:`Set SQL trace<set-autotrace>`.

.. _sql-hint:

Using SQL Hint
==============

Using hints can affect the performance of query execution. you can allow the query optimizer to create more efficient execution plan by referring the SQL HINT. The SQL HINTs related tale join, index, and statistics information are provided by CUBRID. ::

    { CREATE | ALTER } /*+ NO_STATS */ { TABLE | CLASS } ...;
        
    { CREATE | ALTER | DROP } /*+ NO_STATS */ INDEX ...;
     
    { SELECT | UPDATE | DELETE } /*+ <hint> [ { <hint> } ... ] */ ...;

    MERGE /*+ <merge_statement_hint> [ { <merge_statement_hint> } ... ] */ INTO ...;
    
    <hint> ::=
    USE_NL [ (spec_name_comma_list) ] |
    USE_IDX [ (spec_name_comma_list) ] |
    USE_MERGE [ (spec_name_comma_list) ] |
    ORDERED |
    USE_DESC_IDX |
    NO_DESC_IDX |
    NO_COVERING_IDX |
    RECOMPILE
    
    <merge_statement_hint> ::=
    USE_UPDATE_INDEX (<update_index_list>) |
    USE_DELETE_INDEX (<insert_index_list>) |
    RECOMPILE

SQL hints are specified by using a plus sign(+) to comments. To use a hint, there are three styles as being introduced on :doc:`comment`. Therefore, also SQL hint can be used as three styles.

* /\*+ hint \*/
* --+ hint
* //+ hint

The hint comment must appear after the **SELECT**, **CREATE**, **ALTER**, etc. keyword, and the comment must begin with a plus sign (+), following the comment delimiter.  When you specify several hints, they are  separated by blanks.

The following hints can be specified in CREATE/ALTER TABLE statements and CREATE/ALTER/DROP INDEX statements.

*   **NO_STATS** : Related to a statistical information hint. If it is specified, query optimizer does not update the statistical information after running the DDL statement. Therefore, the DDL performance  is improved, but note that the query plan is not optimized.

The following hints can be specified in UPDATE, DELETE and SELECT statements.

*   **USE_NL** : Related to a table join, the query optimizer creates a nested loop join execution plan with this hint.
*   **USE_MERGE** : Related to a table join, the query optimizer creates a sort merge join execution plan with this hint.
*   **ORDERED** : Related to a table join, the query optimizer create a join execution plan with this hint, based on the order of tables specified in the FROM clause. The left table in the FROM clause becomes the outer table; the right one becomes the inner table.
*   **USE_IDX** : Related to an index, the query optimizer creates an index join execution plan corresponding to a specified table with this hint.
*   **USE_DESC_IDX** : This is a hint for the scan in descending index. For more information, see :ref:`index-descending-scan`.
*   **NO_DESC_IDX** : This is a hint not to use the descending index.
*   **NO_COVERING_IDX** : This is a hint not to use the covering index. For details, see :ref:`covering-index`.
*   **NO_STATS** : Related to statistics information, the query optimizer does not update statistics information. Query performance for the corresponding queries can be improved; however, query plan is not optimized because the information is not updated.
*   **RECOMPILE** : Recompiles the query execution plan. This hint is used to delete the query execution plan stored in the cache and establish a new query execution plan.

    .. note:: If the *spec_name* is specified together with **USE_NL**, **USE_IDX** or **USE_MERGE**, the specified join method applies only to the *spec_name*. If **USE_NL** and **USE_MERGE** are specified together, the given hint is ignored. In some cases, the query optimizer cannot create a query execution plan based on the given hint. For example, if **USE_NL** is specified for a right outer join, the query is converted to a left outer join internally, and the join order may not be guaranteed.

MERGE statement can have below hints.

*   **USE_INSERT_INDEX** (<*insert_index_list*>) : An index hint which is used in INSERT clause of MERGE statement. Lists index names to *insert_index_list* to use when executing INSERT clause. This hint is applied to  <*join_condition*> of MERGE statement.
*   **USE_UPDATE_INDEX** (<*update_index_list*>) : An index hint which is used in UPDATE clause of MERGE statement. Lists index names to *update_index_list* to use when executing UPDATE clause. This hint is applied to <*join_condition*> and <*update_condition*> of MERGE statement.
*   **RECOMPILE** : Recompile the query execution plan. Use this hint to remove the old query plan and set the new one to the query plan cache.

The following example shows how to retrieve the years when Sim Kwon Ho won medals and the types of medals. Here, a nested loop join execution plan needs to be created which has the *athlete* table as an outer table and the *game* table as an inner table. It can be expressed by the following query. The query optimizer creates a nested loop join execution plan that has the *game* table as an outer table and the *athlete* table as an inner table.

.. code-block:: sql

    SELECT /*+ USE_NL ORDERED  */ a.name, b.host_year, b.medal
    FROM athlete a, game b 
    WHERE a.name = 'Sim Kwon Ho' AND a.code = b.athlete_code;
    
::

      name                    host_year  medal
    =========================================================
      'Sim Kwon Ho'                2000  'G'
      'Sim Kwon Ho'                1996  'G'
      
    2 rows selected.

The following example shows how to retrieve query execution time with **NO_STATS** hint to improve the functionality of drop partitioned table (*before_2008*); any data is not stored in the table. Assuming that there are more than 1 million data in the *participant2* table. The execution time in the example depends on system performance and database configuration.

.. code-block:: sql

    -- without NO_STATS hint
    ALTER TABLE participant2 DROP partition before_2008;

::

    Execute OK. (31.684550 sec) Committed.

.. code-block:: sql
    
    -- with NO_STATS hint
    ALTER /*+ NO_STATS */ TABLE participant2 DROP partition before_2008;

::

    Execute OK. (0.025773 sec) Committed.

.. _index-hint-syntax:

Index Hint
==========

The index hint syntax allows the query processor to select a proper index by specifying the index in the query. You can specify the index hint by USING INDEX clause or by {USE|FORCE|IGNORE} INDEX syntax after "FROM table" clause.

USING INDEX
-----------

**USING INDEX** clause should be specified after **WHERE** clause of **SELECT**, **DELETE** or **UPDATE** statement. **USING INDEX** clause forces a sequential/index scan to be used or an index that can improve the performance to be included.

If **USING INDEX** clause is specified with the list of index names, query optimizer creates optimized execution plan by calculating the query execution cost based on the specified indexes only and comparing the index scan cost and the sequential scan cost of the specified indexes(CUBRID performs cost-based query optimization to select an execution plan).

The **USING INDEX**  clause is useful to get the results in the desired order without **ORDER BY**. When index scan is performed by CUBRID, the results are created in the order they were saved in the index. When there are more than one indexes in one table, you can use **USING INDEX** to get the query results in a given order of indexes.

::

    SELECT ... WHERE ...
    [USING INDEX { NONE | [ ALL EXCEPT ] <index_spec> [ {, <index_spec> } ...] } ] [ ; ]
    
    DELETE ... WHERE ...
    [USING INDEX { NONE | [ ALL EXCEPT ] <index_spec> [ {, <index_spec> } ...] } ] [ ; ]
    
    UPDATE ... WHERE ...
    [USING INDEX { NONE | [ ALL EXCEPT ] <index_spec> [ {, <index_spec> } ...] } ] [ ; ] 
    
    <index_spec> ::=
      [table_spec.]index_name [(+) | (-)] |
      table_spec.NONE
 

*   **NONE** : If **NONE** is specified,  a sequential scan is used on all tables.
*   **ALL EXCEPT** : All indexes except the specified indexes can be used when the query is executed.
*   *index_name*\ (+) : If (+) is specified after the index_name, it is the first priority in index selection. IF this index is not proper to run the query, it is not selected.
*   *index_name*\ (-) : If (-) is specified after the index_name, it is excluded from index selection. 
*   *table_spec*.\ **NONE** : All indexes are excluded from the selection, so sequential scan is used.

USE, FORCE, IGNORE INDEX
------------------------

Index hints can be specified through **USE**, **FORCE**, **IGNORE INDEX** syntax after table specification of **FROM** clause.

::

    FROM table_spec [ <index_hint_clause> ] ...
    
    <index_hint_clause> ::=
      { USE | FORCE | IGNORE } INDEX  ( <index_spec> [, <index_spec>  ...] )
    
    <index_spec> ::=
      [table_spec.]index_name

*    **USE INDEX** ( <*index_spec*> ): Only specified indexes are considered when choose them.
*    **FORCE INDEX** ( <*index_spec*> ): Specified indexes are chosen as the first priority.
*    **IGNORE INDEX** ( <*index_spec*> ): Specified indexes are excluded from the choice.

USE, FORCE, IGNORE INDEX syntax is automatically rewritten as the proper USING INDEX syntax by the system.

Examples of index hint
----------------------

.. code-block:: sql

    CREATE TABLE athlete2 (
       code             SMALLINT PRIMARY KEY,
       name             VARCHAR(40) NOT NULL,
       gender           CHAR(1),
       nation_code      CHAR(3),
       event            VARCHAR(30)
    );
    CREATE UNIQUE INDEX athlete2_idx1 ON athlete2 (code, nation_code);
    CREATE INDEX athlete2_idx2 ON athlete2 (gender, nation_code);

Below two queries do the same behavior and they select index scan if the specified index, *athlete2_idx2*\'s scan cost is lower than sequential scan cost.

.. code-block:: sql

    SELECT /*+ RECOMPILE */ * 
    FROM athlete2 USE INDEX (athlete2_idx2) 
    WHERE gender='M' AND nation_code='USA';

    SELECT /*+ RECOMPILE */ * 
    FROM athlete2 
    WHERE gender='M' AND nation_code='USA'
    USING INDEX athlete2_idx2;

Below two queries do the same behavior and they always use *athlete2_idx2*

.. code-block:: sql
    
    SELECT /*+ RECOMPILE */ * 
    FROM athlete2 FORCE INDEX (athlete2_idx2) 
    WHERE gender='M' AND nation_code='USA';

    SELECT /*+ RECOMPILE */ * 
    FROM athlete2 
    WHERE gender='M' AND nation_code='USA'
    USING INDEX athlete2_idx2(+);

Below two queries do the same behavior and they always don't use *athlete2_idx2*

.. code-block:: sql
    
    SELECT /*+ RECOMPILE */ * 
    FROM athlete2 IGNORE INDEX (athlete2_idx2) 
    WHERE gender='M' AND nation_code='USA';

    SELECT /*+ RECOMPILE */ * 
    FROM athlete2 
    WHERE gender='M' AND nation_code='USA'
    USING INDEX athlete2_idx2(-);
    
Below query always do the sequential scan.

.. code-block:: sql

    SELECT * 
    FROM athlete2 
    WHERE gender='M' AND nation_code='USA'
    USING INDEX NONE;

    SELECT * 
    FROM athlete2
    WHERE gender='M' AND nation_code='USA'
    USING INDEX athlete2.NONE;
    
Below query forces to be possible to use all indexes except *athlete2_idx2* index.

.. code-block:: sql

    SELECT * 
    FROM athlete2 
    WHERE gender='M' AND nation_code='USA'
    USING INDEX ALL EXCEPT athlete2_idx2;
    
When two or more indexes have been specified in the **USING INDEX** clause, the query optimizer selects the proper one of the specified indexes.

.. code-block:: sql

    SELECT * 
    FROM athlete2 USE INDEX (athlete2_idx2, athlete2_idx1) 
    WHERE gender='M' AND nation_code='USA';

    SELECT * 
    FROM athlete2 
    WHERE gender='M' AND nation_code='USA'
    USING INDEX athlete2_idx2, athlete2_idx1;

When a query is run for several tables, you can specify a table to perform index scan by using a specific index and another table to perform sequential scan. The query has the following format.

.. code-block:: sql

    SELECT * 
    FROM tab1, tab2 
    WHERE ... 
    USING INDEX tab1.idx1, tab2.NONE;

When executing a query with the index hint syntax, the query optimizer considers all available indexes on the table for which no index has been specified. For example, when the *tab1* table includes *idx1* and *idx2* and the *tab2* table includes *idx3*, *idx4*, and *idx5*, if indexes for only *tab1* are specified but no indexes are specified for *tab2*, the query optimizer considers the indexes of *tab2*.

.. code-block:: sql

    SELECT ... 
    FROM tab1, tab2 USE INDEX(tab1.idx1) 
    WHERE ... ;
    
    SELECT ... 
    FROM tab1, tab2 
    WHERE ... 
    USING INDEX tab1.idx1;

The above query select the scan method of table *tab1* after comparing the cost between the sequential scan of the table *tab1* and the index scan of the index *idx1*, and select the scan method of table *tab2* after comparing the cost between the sequential scan of the table *tab2* and the index scan of the indexes *idx3*, *idx4*, *idx5*.
 
.. _tuning-index:

Optimization using indexes
========================== 

.. _covering-index:

Covering Index
--------------

The covering index is the index including the data of all columns in the **SELECT** list and the **WHERE**, **HAVING**, **GROUP BY**, and **ORDER BY** clauses.

You only need to scan the index pages, as the covering index contains all the data necessary for executing a query, and it also reduces the I/O costs as it is not necessary to scan the data storage any further. To increase data search speed, you can consider creating a covering index but you should be aware that the **INSERT** and the **DELETE** processes may be slowed down due to the increase in index size.

The rules about the applicability of the covering index are as follows:

*   If the covering index is applicable, you should use the CUBRID query optimizer first.
*   For the join query, if the index includes columns of the table in the **SELECT** list, use this index.
*   You cannot use the covering index if an index cannot be used.

.. code-block:: sql

    CREATE TABLE t (col1 INT, col2 INT, col3 INT);
    CREATE INDEX i_t_col1_col2_col3 ON t (col1,col2,col3);
    INSERT INTO t VALUES (1,2,3),(4,5,6),(10,8,9);

The following example shows that the index is used as a covering index because columns of both **SELECT** and **WHERE** condition exist within the index.

.. code-block:: sql

    -- csql> ;plan simple
    SELECT * FROM t WHERE col1 < 6;
    
::
    
    Query plan:
     Index scan(t t, i_t_col1_col2_col3, [(t.col1 range (min inf_lt t.col3))] (covers))
     
             col1         col2         col3
    =======================================
                1            2            3
                4            5            6

.. warning::

    If the covering index is applied when you get the values from the **VARCHAR** type column, the empty strings that follow will be truncated. If the covering index is applied to the execution of query optimization, the resulting query value will be retrieved. This is because the value will be stored in the index with the empty string being truncated.

    If you don't want this, use the **NO_COVERING_IDX** hint, which does not use the covering index function. If you use the hint, you can get the result value from the data area rather than from the index area.

    The following is a detailed example of the above situation. First, create a table with columns in **VARCHAR** types, and then **INSERT** the value with the same start character string value but the number of empty characters. Next, create an index in the column.

    .. code-block:: sql

        CREATE TABLE tab(c VARCHAR(32));
        INSERT INTO tab VALUES('abcd'),('abcd    '),('abcd ');
        CREATE INDEX i_tab_c ON tab(c);

    If you must use the index (the covering index applied), the query result is as follows:

    .. code-block:: sql

        -- csql>;plan simple
        SELECT * FROM tab WHERE c='abcd    ' USING INDEX i_tab_c(+);
        
    ::
    
        Query plan:
         Index scan(tab tab, i_tab_c, (tab.c='abcd    ') (covers))
         
         c
        ======================
        'abcd'
        'abcd'
        'abcd'

    The following is the query result when you don't use the index.

    .. code-block:: sql

        SELECT * FROM tab WHERE c='abcd    ' USING INDEX tab.NONE;
         
    ::
    
        Query plan:
         Sequential scan(tab tab)
         
         c
        ======================
        'abcd'
        'abcd    '
        'abcd '

    As you can see in the above comparison result, the value in the **VARCHAR** type retrieved from the index will appear with the following empty string truncated when the covering index has been applied.
    
.. note:: If covering index optimization is available to be applied, the I/O performance can be improved because the disk I/O is decreased. But if you don't want covering index optimization in a special condition, specify a **NO_COVERING_IDX** hint to the query. For how to add a query, refer :ref:`sql-hint`.

.. _order-by-skip-optimization:

Optimizing ORDER BY Clause
--------------------------

The index including all columns in the **ORDER BY** clause is referred to as the ordered index. Optimizing the query with **ORDER BY** clause is no need for the additional sorting process(skip order by), because the query results are searched by the ordered index. In general, for an ordered index, the columns in the **ORDER BY** clause should be located at the front of the index.

.. code-block:: sql

    SELECT * 
    FROM tab 
    WHERE col1 > 0 
    ORDER BY col1, col2

*   The index consisting of *tab* (*col1*, *col2*) is an ordered index.
*   The index consisting of *tab* (*col1*, *col2*, *col3*) is also an ordered index. This is because the *col3*, which is not referred by the **ORDER BY** clause comes after *col1* and *col2* .
*   The index consisting of *tab* (*col1*) is not an ordered index.
*   You can use the index consisting of *tab* (*col3*, *col1*, *col2*) or *tab* (*col1*, *col3*, *col2*) for optimization. This is because *col3* is not located at the back of the columns in the **ORDER BY** clause.

Although the columns composing an index do not exist in the **ORDER BY** clause, you can use an ordered index if the column condition is a constant.

.. code-block:: sql

    SELECT * 
    FROM tab 
    WHERE col2=val 
    ORDER BY col1,col3;

If the index consisting of *tab* (*col1*, *col2*, *col3*) exists and the index consisting of *tab* (*col1*, *col2*) do not exist when executing the above query, the query optimizer uses the index consisting of *tab* (*col1*, *col2*, *col3*) as an ordered index. You can get the result in the requested order when you execute an index scan, so you don't need to sort records.

If you can use the sorted index and the covering index, use the latter first. If you use the covering index, you don't need to retrieve additional data, because the data result requested is included in the index page, and you won't need to sort the result if you are satisfied with the index order.

If the query doesn't include any conditions and uses an ordered index, the ordered index will be used under the condition that the first column meets the **NOT NULL** condition.

.. code-block:: sql

    CREATE TABLE tab (i INT, j INT, k INT);
    CREATE INDEX i_tab_j_k on tab (j,k);
    INSERT INTO tab VALUES (1,2,3),(6,4,2),(3,4,1),(5,2,1),(1,5,5),(2,6,6),(3,5,4);

The following example shows that indexes consisting of *tab* (*j*, *k*) become sorted indexes and no separate sorting process is required because **GROUP BY** is executed by *j* and *k* columns.

.. code-block:: sql

    SELECT i,j,k 
    FROM tab 
    WHERE j > 0 
    ORDER BY j,k;

::
    
    --  the  selection from the query plan dump shows that the ordering index i_tab_j_k was used and sorting was not necessary
    --  (/* --> skip ORDER BY */)
    Query plan:
    iscan
        class: tab node[0]
        index: i_tab_j_k term[0]
        sort:  2 asc, 3 asc
        cost:  1 card 0
    Query stmt:
    select tab.i, tab.j, tab.k from tab tab where ((tab.j> ?:0 )) order by 2, 3
    /* ---> skip ORDER BY */
     
                i            j            k
    =======================================
                5            2            1
                1            2            3
                3            4            1
                6            4            2
                3            5            4
                1            5            5
                2            6            6

The following example shows that *j* and *k* columns execute **ORDER BY** and the index including all columns are selected so that indexes consisting of *tab* (*j*, *k*) are used as covering indexes; no separate process is required because the value is selected from the indexes themselves.

.. code-block:: sql

    SELECT /*+ RECOMPILE */ j,k 
    FROM tab 
    WHERE j > 0 
    ORDER BY j,k;

::
    
    --  in this case the index i_tab_j_k is a covering index and also respects the ordering index property.
    --  Therefore, it is used as a covering index and sorting is not performed.
     
    Query plan:
    iscan
        class: tab node[0]
        index: i_tab_j_k term[0] (covers)
        sort:  1 asc, 2 asc
        cost:  1 card 0
     
    Query stmt: select tab.j, tab.k from tab tab where ((tab.j> ?:0 )) order by 1, 2
    /* ---> skip ORDER BY */
     
                j            k
    ==========================
                2            1
                2            3
                4            1
                4            2
                5            4
                5            5
                6            6

The following example shows that *i* column exists, **ORDER BY** is executed by *j* and *k* columns, and columns that perform **SELECT** are *i*, *j*, and *k*. Therefore, indexes consisting of *tab* (*i*, *j*, *k*) are used as covering indexes; separate sorting process is required for **ORDER BY** *j*, *k* even though the value is selected from the indexes themselves.

.. code-block:: sql

    CREATE INDEX i_tab_j_k ON tab (i,j,k);
    SELECT /*+ RECOMPILE */ i,j,k 
    FROM tab 
    WHERE i > 0 
    ORDER BY j,k;

::
    
    -- since an index on (i,j,k) is now available, it will be used as covering index. However, sorting the results according to
    -- the ORDER BY  clause is needed.
    Query plan:
    temp(order by)
        subplan: iscan
                     class: tab node[0]
                     index: i_tab_i_j_k term[0] (covers)
                     sort:  1 asc, 2 asc, 3 asc
                     cost:  1 card 1
        sort:  2 asc, 3 asc
        cost:  7 card 1
     
    Query stmt: select tab.i, tab.j, tab.k from tab tab where ((tab.i> ?:0 )) order by 2, 3
     
                i            j            k
    =======================================
                5            2            1
                1            2            3
                3            4            1
                6            4            2
                3            5            4
                1            5            5
                2            6            6

.. note::
    Even if the type of a column in the ORDER BY clause is converted by using :func:`CAST`, ORDER BY optimization is executed when the sorting order is the same as before.
    
        +---------------------------------+
        | Before         | After          |
        +================+================+
        | numeric type   | numeric type   |
        +----------------+----------------+
        | string type    | string type    |
        +----------------+----------------+
        | DATETIME       | TIMESTAMP      |
        +----------------+----------------+
        | TIMESTAMP      | DATETIME       |
        +----------------+----------------+
        | DATETIME       | DATE           |
        +----------------+----------------+
        | TIMESTAMP      | DATE           |
        +----------------+----------------+
        | DATE           | DATETIME       |
        +----------------+----------------+

.. _index-descending-scan:

Index Scan in Descending Order
------------------------------

When a query is executed by sorting in descending order as follows, it usually creates a descending index. In this way, you do not have to go through addition procedure.

.. code-block:: sql

    SELECT * 
    FROM tab 
    [WHERE ...] 
    ORDER BY a DESC

However, if you create an ascending index and an descending index in the same column, the possibility of deadlock increases. In order to decrease the possibility of such case, CUBRID supports the descending scan only with ascending index. Users can use the **USE_DESC_IDX** hint to specify the use of the descending scan. If the hint is not specified, the following three query executions should be considered, provided that the columns listed in the **ORDER BY** clause can use the index.

*   Sequential scan + Sort in descending order
*   Scan in general ascending order + sort in descending
*   Scan in descending order that does not require a separate scan

Although the **USE_DESC_IDX** hint is omitted for the scan in descending order, the query optimizer decides the last execution plan of the three listed for an optimal plan.

.. note:: The **USE_DESC_IDX** hint is not supported for the join query.

.. code-block:: sql

    CREATE TABLE di (i INT);
    CREATE INDEX i_di_i on di (i);
    INSERT INTO di VALUES (5),(3),(1),(4),(3),(5),(2),(5);

The query will be executed as an ascending scan without **USE_DESC_IDX** hint.

.. code-block:: sql

    -- The query will be executed with an ascending scan. 
     
    SELECT  * 
    FROM di 
    WHERE i > 0 
    LIMIT 3;

::
    
    Query plan:
     
    Index scan(di di, i_di_i, (di.i range (0 gt_inf max) and inst_num() range (min inf_le 3)) (covers))
     
                i
    =============
                1
                2
                3

If you add **USE_DESC_IDX** hint to the above query, a different result will be shown by descending scan.

.. code-block:: sql

    -- We now run the following query, using the ''use_desc_idx'' SQL hint:
     
    SELECT /*+ USE_DESC_IDX */ * 
    FROM di 
    WHERE i > 0 
    LIMIT 3;
     
::

    Query plan:
     Index scan(di di, i_di_i, (di.i range (0 gt_inf max) and inst_num() range (min inf_le 3)) (covers) (desc_index))
     
                i
    =============
                5
                5
                5

The following example requires descending order by **ORDER BY** clause. In this case, there is no **USE_DESC_IDX** but do the descending scan.

.. code-block:: sql

    -- We also run the same query, this time asking that the results are displayed in descending order. 
    -- However, no hint is given. 
    -- Since ORDER BY...DESC clause exists, CUBRID will use descending scan, even though the hint is not given, 
    -- thus avoiding to sort the records.
     
    SELECT * 
    FROM di 
    WHERE i > 0 
    ORDER BY i DESC LIMIT 3;

::
    
    Query plan:
     Index scan(di di, i_di_i, (di.i range (0 gt_inf max)) (covers) (desc_index))
     
                i
    =============
                5
                5
                5

.. _group-by-skip-optimization:

Optimizing GROUP BY Clause
--------------------------

**GROUP BY** clause optimization works on the premise that if all columns in the **GROUP BY** clause are included in an index, you can use the index upon executing a query, so you don't execute a separate sorting job. 
The columns in the **GROUP BY** clause must exist in front side of the column forming the index.

.. code-block:: sql

    SELECT * 
    FROM tab 
    WHERE col1 > 0 
    GROUP BY col1,col2

*   You can use the index consisting of tab(col1, col2) for optimization.
*   The index consisting of tab(col1, col2, col3) can be used because col3 no referred by **GROUP BY** comes after col1 and col2.
*   You cannot use the index consisting of tab(col1) for optimization.
*   You also cannot use the index consisting of tab(col3, col1, col2) or tab(col1, col3, col2), because col3 is not located at the back of the column in the **GROUP BY** clause.

You can use the index if the column condition is a constant although the column consisting of the index doesn't exist in the **GROUP BY** clause.

.. code-block:: sql

    SELECT * 
    FROM tab 
    WHERE col2=val 
    GROUP BY col1,col3

If there is any index that consists of tab(col1, col2, col3) in the above example, use the index for optimizing **GROUP BY**.

Row sorting by **GROUP BY** is not required, because you can get the result as the requested order on the index scan.

If the index consisting of the **GROUP BY** column and the first column of the index is **NOT NULL**, even though there is no **WHERE** clause, the **GROUP BY** optimization will be applied.

**GROUP BY** optimization is applied only when **MIN** () or **MAX** () are used in an aggregate function, and to use the two aggregate functions together, an identical column must be used.

.. code-block:: sql

    CREATE INDEX i_T_a_b_c ON T(a, b, c);
    SELECT a, MIN(b), c, MAX(b) FROM T WHERE a > 18 GROUP BY a, b;

**Example**

.. code-block:: sql

    CREATE TABLE tab (i INT, j INT, k INT);
    CREATE INDEX i_tab_j_k ON tab (j, k);
    INSERT INTO tab VALUES (1,2,3),(6,4,2),(3,4,1),(5,2,1),(1,5,5),(2,6,6),(3,5,4);

The following example shows that indexes consisting of tab(j,k) are used and no separate sorting process is required because **GROUP BY** is executed by j and k columns.

.. code-block:: sql

    SELECT i,j,k 
    FROM tab 
    WHERE j > 0 
    GROUP BY j,k;
     
    --  the  selection from the query plan dump shows that the index i_tab_j_k was used and sorting was not necessary
    --  (/* ---> skip GROUP BY */)
     
::

    Query plan:
    iscan
        class: tab node[0]
        index: i_tab_j_k term[0]
        sort:  2 asc, 3 asc
        cost:  1 card 0
     
    Query stmt:
    select tab.i, tab.j, tab.k from tab tab where ((tab.j> ?:0 )) group by tab.j, tab.k
    /* ---> skip GROUP BY */
                i            j            k
                5            2            1
                1            2            3
                3            4            1
                6            4            2
                3            5            4
                1            5            5
                2            6            6

The following example shows that an index consisting of tab(j,k) is used and no separate sorting process is required while **GROUP BY** is executed by j and k columns, no condition exists for j, and j column has **NOT NULL** attribute.

.. code-block:: sql

    ALTER TABLE tab CHANGE COLUMN j j INT NOT NULL;
    
    SELECT * 
    FROM tab 
    GROUP BY j,k;
     
::

    --  the  selection from the query plan dump shows that the index i_tab_j_k was used (since j has the NOT NULL constraint )
    --  and sorting was not necessary (/* ---> skip GROUP BY */)
    Query plan:
    iscan
        class: tab node[0]
        index: i_tab_j_k
        sort:  2 asc, 3 asc
        cost:  1 card 0
     
    Query stmt: select tab.i, tab.j, tab.k from tab tab group by tab.j, tab.k
    /* ---> skip GROUP BY */
    === <Result of SELECT Command in Line 1> ===
                i            j            k
    =======================================
                5            2            1
                1            2            3
                3            4            1
                6            4            2
                3            5            4
                1            5            5
                2            6            6